Hierarchical sampling for object identification

ABSTRACT

Aspects of the present disclosure include methods, systems, and non-transitory computer readable media that perform the steps of receiving a first plurality of snapshots, generating a first plurality of descriptors each associated with the first plurality of snapshots, grouping the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, selecting a representative snapshot for each of the at least one cluster, generating at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identifying a target by applying the at least one second descriptor to a second plurality of snapshots.

CROSS-REFERENCE TO RELATED APPLICATIONS

The current application is a continuation application of U.S. patent application Ser. No. 17/061,262, entitled “HIERARCHICAL SAMPLING FOR OBJECT IDENTIFICATION,” filed Oct. 1, 2020, which claims the benefit of U.S. Provisional Application No. 62/908,980, entitled “HIERARCHICAL SAMPLING FOR OBJECT IDENTIFICATION,” filed on Oct. 1, 2019, the contents of which are incorporated by reference in their entireties.

BACKGROUND

In surveillance systems, numerous images (e.g., thousands or even millions) may be captured by multiple cameras. Each image may show people and objects (e.g., cars, infrastructures, accessories, etc.). In certain circumstances, security personnel monitoring the surveillance systems may want to locate and/or track a particular person and/or object through the multiple cameras. However, it may be computationally intensive for the surveillance systems to accurately track the particular person and/or object by searching through the images. Therefore, improvements may be desirable.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the DETAILED DESCRIPTION. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

An aspect of the present disclosure includes a method including receiving a first plurality of snapshots, generating a first plurality of descriptors each associated with the first plurality of snapshots, grouping the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, selecting a representative snapshot for each of the at least one cluster, generating at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identifying a target by applying the at least one second descriptor to a second plurality of snapshots.

Aspects of the present disclosure include a system having a memory that stores instructions and a processor configured to execute the instructions to receive a first plurality of snapshots, generate a first plurality of descriptors each associated with the first plurality of snapshots, group the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, select a representative snapshot for each of the at least one cluster, generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identify a target by applying the at least one second descriptor to a second plurality of snapshots.

Certain aspects of the present disclosure include a non-transitory computer readable medium having instructions stored therein that, when executed by a processor, cause the processor to receive a first plurality of snapshots, generate a first plurality of descriptors each associated with the first plurality of snapshots, group the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, select a representative snapshot for each of the at least one cluster, generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identify a target by applying the at least one second descriptor to a second plurality of snapshots.

BRIEF DESCRIPTION OF THE DRAWINGS

The features believed to be characteristic of aspects of the disclosure are set forth in the appended claims. In the description that follows, like parts are marked throughout the specification and drawings with the same numerals, respectively. The drawing figures are not necessarily drawn to scale and certain figures may be shown in exaggerated or generalized form in the interest of clarity and conciseness. The disclosure itself, however, as well as a preferred mode of use, further objects and advantages thereof, will be best understood by reference to the following detailed description of illustrative aspects of the disclosure when read in conjunction with the accompanying drawings, wherein:

FIG. 1 illustrates an example of an environment for implementing the hierarchical sampling for re-identification process in accordance with aspects of the present disclosure;

FIG. 2 illustrates an example of a method for implementing the hierarchical sampling for re-identification process in accordance with aspects of the present disclosure;

FIG. 3 illustrates an example of a method for implementing the hierarchical sampling for re-identification process including classification in accordance with aspects of the present disclosure;

FIG. 4 illustrates an example of a method for implementing the hierarchical sampling for re-identification process using neural networks in accordance with aspects of the present disclosure; and

FIG. 5 illustrates an example of a computer system in accordance with aspects of the present disclosure.

DETAILED DESCRIPTION

The following includes definitions of selected terms employed herein. The definitions include various examples and/or forms of components that fall within the scope of a term and that may be used for implementation. The examples are not intended to be limiting.

The term “processor,” as used herein, can refer to a device that processes signals and performs general computing and arithmetic functions. Signals processed by the processor can include digital signals, data signals, computer instructions, processor instructions, messages, a bit, a bit stream, or other computing that can be received, transmitted and/or detected. A processor, for example, can include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described herein.

The term “bus,” as used herein, can refer to an interconnected architecture that is operably connected to transfer data between computer components within a singular or multiple systems. The bus can be a memory bus, a memory controller, a peripheral bus, an external bus, a crossbar switch, and/or a local bus, among others.

The term “memory,” as used herein, can include volatile memory and/or non-volatile memory. Non-volatile memory can include, for example, ROM (read only memory), PROM (programmable read only memory), EPROM (erasable PROM) and EEPROM (electrically erasable PROM). Volatile memory can include, for example, RAM (random access memory), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), and direct RAM bus RAM (DRRAM).

The input to the hierarchical sampling re-identification system is a set of object tracks, where each track is a sequence of snapshots captured across consecutive frames of the video stream. Given this input, the re-identification system may extract meta-data in the form of descriptors (also called visual features), which are arrays of numbers representing the visual appearance of the object in each track. A typical approach is to extract a descriptor for each snapshot in the track, and store either all the descriptors or an aggregated descriptor (e.g., using average or max pooling) in the database. The resulting collection of descriptors provides the necessary meta-data to later perform re-identification. Typically, highly complex descriptors leading to accurate re-identification tend to have higher computational costs, while less complex descriptors tend to have lower computational cost, at the expense of providing lower re-identification accuracy.

If there are N snapshots in the track, the system extracts one descriptor per snapshot, and the extraction cost per descriptor is C, then the total cost for the track is T=C*N. In order to reduce this cost, the system can extract descriptors from only M snapshots in the track, where M<<N, which lowers the total cost to C*M. In some instances, the higher the number of descriptors M, the more complete the description of the whole track, and the higher the accuracy of the subsequent re-identification.

One aspect of the present disclosure includes how the system samples the best M snapshots that, combined, provide the most complete description of the object track. In general, the ideal sampling process first clusters the snapshots in such a way that those with similar characteristics fall in the same cluster, and then picks a single representative snapshot per cluster, which avoids extracting descriptors from redundant snapshots in the same cluster.

Clustering may rely on a similarity function that accurately compares the key visual appearance properties of snapshots, in order to put the ones with the same properties in the same cluster. Such a similarity function may be obtained by comparing the descriptors among snapshots, where each descriptor summarizes the key visual properties of its corresponding snapshot.

An aspect of the present disclosure includes a system that extracts lower complexity descriptors for clustering, and then extracts higher-level complexity descriptors only from one snapshot per cluster.

Let C_(L) be the computational cost corresponding to the low-level complexity descriptor (measured as the processing time in seconds that it takes to extract the descriptor for one snapshot), let C_(H) be the computational cost corresponding to the high-complexity descriptor, let N be the total number of snapshots in the current object track, let K be the number of samples picked after clustering, and let C_(C) be the cost of clustering. The total cost for the pipeline for a two-level hierarchical sampling is:

C_(T)=N*C_(L)+C_(C)+K*C_(H)
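As a rough illustration of the cost formula above, the following Python sketch compares the two-level cost with the cost of extracting the high-complexity descriptor from every snapshot. The function names and all numeric values are hypothetical and chosen only for illustration; they do not come from the disclosure.

```python
def full_extraction_cost(n, c_high):
    """Cost of extracting the high-complexity descriptor from all N snapshots."""
    return n * c_high


def two_level_cost(n, k, c_low, c_high, c_cluster):
    """C_T = N*C_L + C_C + K*C_H for two-level hierarchical sampling."""
    return n * c_low + c_cluster + k * c_high


# Hypothetical values (seconds); not taken from the disclosure.
N, K = 1000, 10
C_L, C_H, C_C = 0.001, 0.05, 0.2

baseline = full_extraction_cost(N, C_H)
hierarchical = two_level_cost(N, K, C_L, C_H, C_C)
print(f"baseline: {baseline:.2f} s, two-level: {hierarchical:.2f} s")
```

Under these assumed values the two-level pipeline is cheaper because K is much smaller than N and C_(L) is much smaller than C_(H); the saving shrinks as either ratio approaches one.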

A generalization of the previous strategy may be obtained by adding intermediate layers of complexity. The lower complexity descriptor C₀ may be applied to all the snapshots. Next, the system extracts descriptors of intermediate complexity and discriminative power from the snapshots selected at the lower layer. As the intermediate layer is more discriminative than the lower layer, using the descriptors of intermediate complexity allows the system to further reduce the number of selected snapshots. Finally, the system extracts the highest complexity descriptors from this reduced set of K₁ snapshots.
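One way to write the total cost of such a multi-level pipeline is sketched below. It assumes (this is not stated in the disclosure) that each sampling level applies its descriptor to every snapshot that survived the previous level and then pays its own clustering cost. The function name and parameter layout are hypothetical.

```python
def multi_level_cost(n, kept_per_level, descriptor_costs, clustering_costs):
    """Total cost of several sampling levels followed by a final level.

    n                -- number of snapshots in the track
    kept_per_level   -- [K_0, K_1, ...] snapshots kept after each sampling level
    descriptor_costs -- [C_0, C_1, ..., C_final] per-snapshot extraction costs
    clustering_costs -- clustering cost paid at each sampling level
    """
    if len(descriptor_costs) != len(kept_per_level) + 1:
        raise ValueError("need one descriptor cost per sampling level plus the final level")
    if len(clustering_costs) != len(kept_per_level):
        raise ValueError("need one clustering cost per sampling level")

    total = 0.0
    pool_size = n
    for kept, c_desc, c_clu in zip(kept_per_level, descriptor_costs, clustering_costs):
        total += pool_size * c_desc + c_clu  # describe the pool, then cluster it
        pool_size = kept                     # only the representatives move on
    total += pool_size * descriptor_costs[-1]  # final, highest-complexity descriptor
    return total
```

With a single sampling level this reduces to the two-level expression N*C_(L)+C_(C)+K*C_(H).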

Another example of a sampling component that can be used to replace some of the layers of the pipeline is based on video segmentation. This sampling works by first detecting changes across frames, then segmenting the video into pieces of relatively constant content, and finally selecting a single snapshot for each segment. Typically, the higher the complexity of the segmentation algorithm, the better the selection of snapshots, at the expense of a higher computational cost. This is the same trade-off described previously, and it therefore allows a similar hierarchical (multi-level) strategy.
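A minimal sketch of segmentation-based sampling follows. It assumes each frame is already summarized by a single number (for example, a mean intensity), detects a change when consecutive values differ by more than a threshold, and picks the middle snapshot of each segment. The scalar summary, the threshold, and the middle-snapshot rule are illustrative assumptions rather than details from the disclosure.

```python
def segment_and_sample(frame_values, threshold):
    """Split frames into segments of roughly constant content and
    return the index of one representative frame per segment."""
    if not frame_values:
        return []
    segments = [[0]]
    for idx in range(1, len(frame_values)):
        if abs(frame_values[idx] - frame_values[idx - 1]) > threshold:
            segments.append([idx])      # change detected: start a new segment
        else:
            segments[-1].append(idx)    # same content: extend the current segment
    return [seg[len(seg) // 2] for seg in segments]


# Hypothetical per-frame summaries with two abrupt content changes.
values = [10, 11, 10, 50, 51, 52, 52, 20, 21]
print(segment_and_sample(values, threshold=5))  # [1, 5, 8]
```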

As described in the previous paragraph, sampling layers of different complexity can be obtained by extracting not only descriptors of different complexity, but also other types of metadata. Examples of this are the number of the frame where the snapshot is found, the spatial coordinates of the object in the frame, or the size of the object in pixels. For example, using the last two types of metadata, the system may cluster the snapshots by spatial position and size, so that snapshots in which the object has not moved much fall into the same cluster.

In some instances, the pipeline may include an additional component which is a classifier. This component provides the class of object being described. Based on that, a class-specific descriptor can be extracted. For example, the system first utilizes the hierarchical sampling as described in previous sections up to level L-1. As a result of this process, the system may obtain a reduced number of sampled object image snapshots. For example, if the original process included only 3 levels of complexity, then the system may apply level 0 and level 1, and may obtain K₁ snapshots as a result of the last level. In general, the system will obtain K_(L-1) snapshots after L-1 levels. Then, the system may either apply the classification component to all these K_(L-1) sampled snapshots, or just apply it to a smaller subset of K′ snapshots (e.g., by using clustering again). A single classification decision is obtained by aggregating (e.g., averaging) the classification score obtained for each of the K′ snapshots and selecting the class whose aggregated score is maximum. Once the class has been determined, class-specific descriptors can be extracted from each of the K samples, in order to reduce the computational cost.

Many methods of clustering exist, including K-means, DBSCAN, Gaussian Mixture Models, Mean-Shift, and others. Also, different distances can be used, including Euclidean, Cosine distance, Mahalanobis, Geodesic distance and others. Another clustering type is online clustering.
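For reference, two of the distances named above can be sketched in plain Python as follows. Descriptors are assumed to be equal-length sequences of numbers, and cosine distance is taken here as one minus the cosine similarity, which is a common convention and an assumption of this sketch.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to descriptor magnitude, while cosine distance compares direction only, so the choice can change which snapshots end up in the same cluster.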

In another implementation, the system may select the next best snapshot that has the highest quality in terms of re-identification. For example, some snapshots will have higher quality because the object has better illumination, and therefore the details can be seen and described better. The alignment of the object in the snapshot also matters for re-identification, as a badly aligned snapshot will be visually dissimilar to other snapshots corresponding to the same object.

In order to measure the quality of the snapshot for re-identification purposes, the system may measure the Mean Average Precision (MAP) of a snapshot. Snapshots with lower quality will tend to be more often confused with snapshots from other objects, since the details are not so clear. On the other hand, snapshots with higher quality will tend to have a clearer separation from snapshots from other objects, and this can be measured by the MAP metric.

In order to avoid having to compute this MAP metric based on actual comparisons with a gallery of snapshots in the database, the system may use regression as a fast proxy. The idea is to train a “regression model” that is able to estimate the MAP by looking at the snapshot. A typical regression model is obtained by Neural Networks (NNs). Because NNs are also used to extract descriptors of high quality, the system may train a single network that provides both an estimated MAP score and a descriptor.

In order to avoid selecting multiple snapshots having similar MAP scores, after one snapshot is selected, the system may need to avoid considering all the snapshots whose similarity to the selected snapshot is higher than some pre-specified threshold. Using neural networks is just one possibility, as there are other regressors that can be used. The hierarchy comes from the fact that the first levels use fast and less accurate regressors, which usually provide sub-optimal MAP estimation, so that more samples need to be obtained to compensate, while the last levels obtain better MAP estimates, which allows the selection to be narrowed down to fewer snapshots.
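The select-then-suppress behaviour described above can be sketched as a greedy loop. In this sketch `estimate_map` stands in for the trained regressor and `similarity` for whatever similarity measure is used; both are hypothetical callables supplied by the caller, and the toy data is invented for illustration.

```python
def greedy_select(snapshots, estimate_map, similarity, k, threshold):
    """Pick up to k snapshots by highest estimated MAP, removing
    near-duplicates of each pick before choosing the next one."""
    pool = list(snapshots)
    selected = []
    while pool and len(selected) < k:
        best = max(pool, key=estimate_map)
        selected.append(best)
        # Drop the pick itself and anything too similar to it.
        pool = [s for s in pool
                if s is not best and similarity(s, best) <= threshold]
    return selected


# Toy snapshots: (name, estimated MAP, 1-D position used for similarity).
snaps = [("a", 0.9, 0.0), ("b", 0.85, 0.1), ("c", 0.6, 5.0),
         ("d", 0.5, 5.2), ("e", 0.4, 9.0)]
picked = greedy_select(
    snaps,
    estimate_map=lambda s: s[1],
    similarity=lambda s, t: 1.0 / (1.0 + abs(s[2] - t[2])),
    k=3,
    threshold=0.5,
)
print([name for name, _, _ in picked])  # ['a', 'c', 'e']
```

Snapshot "b" has the second-highest score but is suppressed because it is nearly identical to "a", which is the behaviour the threshold is meant to produce.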

Referring to FIG. 1, an example of an environment 100 for performing hierarchical sampling for object re-identification may include a server 140 that receives surveillance videos and/or images 112 from a plurality of cameras 110. The plurality of cameras 110 may capture the surveillance videos and/or images 112 of one or more locations 114 that include people and/or objects (e.g., cars, bags, etc.).

In certain instances, the server 140 may include a communication component 142 that receives and/or sends data (such as the captured surveillance videos and/or images 112) from and to other devices, such as a data repository 150. The server 140 may include an identification component 144 that performs the hierarchical sampling process for object re-identification. The server 140 may include a classification component 146 that classifies one or more images or objects in the images. The server 140 may include an artificial intelligence (AI) component 148 that performs AI operations during the re-identification process.

In some implementations, the captured surveillance videos and/or images may include snapshots (i.e., frames or portions of frames). For example, a one-minute surveillance video and/or images may include 30, 60, 120, 180, 240, or other numbers of snapshots. During the hierarchical sampling process, the communication component 142 may receive the surveillance video and/or images 112 from the plurality of cameras 110. The identification component 144 may perform the hierarchical sampling process for re-identification. The classification component 146 may classify an image or objects of the image. The AI component 148 may perform a filtering and/or representative snapshot selection process.

In certain aspects, the communication component 142 of the server 140 may receive the surveillance video and/or images 112. The server 140 may generate and apply a first set of descriptors of low complexity (such as color, lighting, shape, etc.) relating to a person or object to be identified in the surveillance video and/or images 112. The application of the first set of descriptors to the surveillance video and/or images 112 may cause the server 140 to group the snapshots of the person or object to be identified in the surveillance video and/or images 112 into separate clusters. For example, by using a shape descriptor (shape of people or objects), the server 140 may obtain three clusters: a first cluster 120 (e.g., snapshots with varying standing postures of the person), a second cluster 122 (e.g., snapshots with varying sitting postures of the person), and a third cluster 124 (e.g., snapshots with varying jumping postures of the person).

Next, in some instances, the server 140 may identify a representative snapshot 120a, 122a, 124a (described in further detail below) from each of the first, second, and third clusters 120, 122, 124. The representative snapshots 120a, 122a, 124a may be those that include the fewest background objects, have the clearest contrast, have the best lighting, show certain desired features, etc.

Next, in some examples, the server 140 may generate a second set of descriptors based on the representative snapshots 120a, 122a, 124a. The second set of descriptors may be more complex than the first set of descriptors (e.g., including spatial information, timing information, class information, etc.).

In certain implementations, the server 140 may apply the second set of descriptors to the surveillance video and/or images 112 to identify and/or locate a target, such as the person or object to be identified.

Turning to FIG. 2, an example of a method 200 for performing hierarchical sampling for re-identification may be performed by the server 140 and one or more of the communication component 142 and/or the identification component 144.

At block 202, the method 200 may start the hierarchical sampling for a re-identification process.

At block 204, the method 200 may set the counters i and j to 0. The counter i may represent the number of iterations of selecting descriptors and clustering snapshots. The counter j may represent the number of tracks (e.g., groups of videos and/or images). For example, the identification component 144 of the server 140 receiving surveillance videos and/or images from nine cameras of the plurality of cameras 110 may have a j value of “9” (1 track from each camera).

At block 206, the method 200 may input snapshots of track(j) into a pool P. For example, the identification component 144 may input a portion of the surveillance videos and/or images 112 into a pool P.

At block 208, the method 200 may generate a descriptor of complexity C_(i) for each snapshot in the pool P. The descriptor of complexity C₀ may have lower complexity than the descriptor of complexity C₁, the descriptor of complexity C₁ may have lower complexity than the descriptor of complexity C₂, and so forth.

At block 210, the method 200 may determine if i=L, where L is the number of levels of complexities. If the identification component 144 of the server 140 determines that i<L, then the identification component 144 may move onto block 212.

At block 212, the method 200 may group snapshots into K clusters. For example, the identification component 144 of the server 140 may group the surveillance videos and/or images 112 into three clusters: the first, second, and third clusters 120, 122, 124.

At block 214, the method 200 may select a snapshot per cluster. For example, the identification component 144 of the server 140 may select the snapshots 120a, 122a, 124a for each of the first, second, and third clusters 120, 122, 124.

At block 216, the method 200 may input the selected snapshots into a pool P′. For example, the identification component 144 may input the snapshots 120a, 122a, 124a into the pool P′.

At block 218, the method 200 may increment the counter i by one and set the pool P to be equal to the pool P′. For example, the identification component 144 may increment the counter i and set the pool P to P′.

In some implementations, the method 200 may iteratively perform some or all of the steps between blocks 208 and 218 until, at block 210, the identification component 144 of the server 140 determines that i=L. If the identification component 144 of the server 140 determines that i=L, then the identification component 144 may move onto block 220.

At block 220, the method 200 may inject the descriptors of complexity C_(L) into a database, such as the data repository 150. For example, the identification component 144 may apply the descriptors of complexity C_(L) (e.g., C₁ for 1 level, C₂ for 2 levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150.

At block 222, the method 200 may determine if j=M, where M is the number of tracks. If the identification component 144 of the server 140 determines that j<M, then the identification component 144 may move onto block 224.

At block 224, the method 200 may increment the counter j by 1. For example, the identification component 144 may increment the counter j by 1.

In some implementations, the method 200 may iteratively perform some or all of the steps between blocks 208 and 222 until, at block 222, the identification component 144 of the server 140 determines that j=M. If the identification component 144 of the server 140 determines that j=M, then the identification component 144 may move onto block 226 to terminate the method 200.
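The per-track portion of the method 200 (blocks 206 through 220) can be sketched as a short loop. The extractors, the clustering routine, and the representative-selection rule are passed in as callables; `bin_cluster` and the toy data are hypothetical stand-ins used only to make the sketch runnable, not the clustering of the disclosure.

```python
def hierarchical_sample(snapshots, extractors, cluster_counts, cluster_fn, pick_fn):
    """Run sampling levels 0..L-1, then describe the survivors at level L.

    extractors     -- [f_0, ..., f_L], cheapest descriptor first
    cluster_counts -- [K_0, ..., K_(L-1)] clusters requested at each level
    """
    if len(extractors) != len(cluster_counts) + 1:
        raise ValueError("need one extractor per sampling level plus the final level")
    pool = list(snapshots)                                   # block 206
    for extract, k in zip(extractors, cluster_counts):       # while i < L
        descriptors = [extract(s) for s in pool]             # block 208
        labels = cluster_fn(descriptors, k)                  # block 212
        clusters = {}
        for snap, label in zip(pool, labels):
            clusters.setdefault(label, []).append(snap)
        pool = [pick_fn(members) for members in clusters.values()]  # blocks 214-218
    final_extract = extractors[-1]
    return [(s, final_extract(s)) for s in pool]             # ready for block 220


def bin_cluster(descriptors, k):
    """Toy clustering: equal-width bins over scalar descriptors."""
    lo, hi = min(descriptors), max(descriptors)
    width = (hi - lo) / k or 1.0
    return [min(int((d - lo) / width), k - 1) for d in descriptors]


# Toy track: each "snapshot" is just a number standing in for an image.
track = [1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.0]
result = hierarchical_sample(
    track,
    extractors=[lambda s: s, lambda s: (s, s * s)],  # cheap, then expensive
    cluster_counts=[3],
    cluster_fn=bin_cluster,
    pick_fn=lambda members: members[0],
)
print([snap for snap, _ in result])  # [1.0, 5.0, 9.8]
```

Calling `hierarchical_sample` once per track corresponds to the outer loop over the counter j; the expensive extractor runs on three snapshots here instead of seven.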

Turning now to FIG. 3, an example of a method 300 for performing hierarchical sampling for re-identification including classification may be performed by the server 140 and one or more of the communication component 142, the identification component 144, the classification component 146 and/or the AI component 148.

At block 302, the method 300 may perform a hierarchical sampling process for re-identification (with or without classification) for L-1 levels as described above.

At block 304, the method 300 may group snapshots in K′ clusters as described above.

At block 306, the method 300 may select a snapshot per cluster. For example, the identification component 144 may select K′ snapshots for the K′ clusters based on the quality of the snapshot as described above.

At block 308, the method 300 may classify the selected snapshots. For example, the identification component 144 and/or the classification component 146 may classify the selected snapshots based on one or more classification algorithms as described above. During the classification process, each of the selected snapshots may be assigned a plurality of classification scores associated with a plurality of classes (e.g., person class, car class, building class, object class, etc.). In one non-limiting example, a first snapshot may be assigned classification scores of (car-1, person-5, building-2), and a second snapshot may be assigned classification scores of (car-0, person-4, building-0).

At block 310, the method 300 may aggregate the classification scores. For example, the identification component 144 and/or the classification component 146 may aggregate the corresponding classification scores for the K′ snapshots as described above. For example, the aggregated scores for the example above are (car-1, person-9, building-2).

At block 312, the method 300 may determine a class C based on the aggregated classification scores. For example, the identification component 144 and/or the classification component 146 may determine that, given the aggregated scores of (car-1, person-9, building-2), the classification for the corresponding cluster is a person as described above.
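The aggregation and decision of blocks 310 and 312 can be sketched with the example scores given above. Summation is used as the aggregation rule because it reproduces the totals in the example; averaging would select the same class. The function names and the dictionary representation are assumptions of this sketch.

```python
def aggregate_scores(per_snapshot_scores):
    """Sum the per-class scores over all classified snapshots."""
    totals = {}
    for scores in per_snapshot_scores:
        for cls, score in scores.items():
            totals[cls] = totals.get(cls, 0) + score
    return totals


def decide_class(per_snapshot_scores):
    """Return the class whose aggregated score is maximum."""
    totals = aggregate_scores(per_snapshot_scores)
    if not totals:
        raise ValueError("no classification scores to aggregate")
    return max(totals, key=totals.get)


scores = [
    {"car": 1, "person": 5, "building": 2},  # first snapshot
    {"car": 0, "person": 4, "building": 0},  # second snapshot
]
print(aggregate_scores(scores))  # {'car': 1, 'person': 9, 'building': 2}
print(decide_class(scores))      # person
```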

At block 314, the method 300 may generate K class-specific descriptors of class C with complexity C_(L). For example, the identification component 144 and/or the classification component 146 may generate class-specific descriptors of the person class with complexity C_(L) as described above.

At block 316, the method 300 may inject the K class-specific descriptors of complexity C_(L) into the database. For example, the identification component 144 may apply the class-specific descriptors of complexity C_(L) (e.g., C₁ for 1 level, C₂ for 2 levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150 as described above.

Turning now to FIG. 4, an example of a method 400 for performing hierarchical sampling using neural networks for re-identification may be performed by the server 140 and one or more of the communication component 142 and/or the identification component 144.

At block 402, the method 400 may start the hierarchical sampling for re-identification process as described above.

At block 404, the method 400 may set the counters i and j to 0. The counter i may represent the number of iterations of selecting descriptors and clustering snapshots. The counter j may represent the number of tracks (e.g., groups of videos and/or images). For example, the identification component 144 of the server 140 receiving surveillance videos and/or images from nine cameras of the plurality of cameras 110 may have a j value of “9” (1 track from each camera) as described above.

At block 406, the method 400 may input snapshots of track(j) into a pool P. For example, the identification component 144 may input a portion of the surveillance videos and/or images 112 into a pool P as described above.

At block 408, the method 400 may select, using a network N_(i), snapshots from the pool P with the highest estimated mean average precisions (MAPs) and put them into a pool P′. For example, the AI component 148 may use a neural network N_(i) to select snapshots having the highest MAP for re-identification purposes. In one example, the AI component 148 may use regression as a fast proxy as described above. The AI component 148 may train a “regression model” that estimates the MAP by examining a snapshot. The snapshots with the highest MAPs may be the snapshots with the highest qualities for re-identification (e.g., good illumination, high level of detail, good alignment and/or orientation).

At block 410, the method 400 may determine whether P′=K_(i), where K_(i) is a predetermined number associated with the number of clusters. If the identification component 144 determines that P′≠K_(i), the identification component 144 may proceed to block 412 as described above.

At block 412, the method 400 may remove all snapshots having similarity indices above a threshold, wherein the similarity indices are associated with resemblance to the selected snapshot from P. For example, the identification component 144 may remove all snapshots having similarity indices above a threshold, wherein the similarity indices are associated with resemblance to the selected snapshot from P as described above. In a non-limiting example, two images that look “similar” (e.g., same people/object, same background, taken within half of a second from each other, etc.) may have high similarity indices.

Next, the method 400 may iteratively perform some or all of the steps between blocks 406 and 412 until, at block 410, the identification component 144 of the server 140 determines that P′=K_(i). If the identification component 144 of the server 140 determines that P′=K_(i), then the identification component 144 may move onto block 414.

At block 414, the identification component 144 of the server 140 may increment the counter i by one and set the pool P to be equal to the pool P′. For example, the identification component 144 may increment the counter i and set the pool P to P′.

At block 416, the method 400 may determine if i=L, where L is the number of levels of complexities. If the identification component 144 of the server 140 determines that i<L, then the identification component 144 may move back to block 408 again.

In some implementations, the method 400 may iteratively perform some or all of the steps between blocks 408 and 416 until, at block 416, the identification component 144 of the server 140 determines that i=L. If the identification component 144 of the server 140 determines that i=L, then the identification component 144 may move onto block 418.

At block 418, the method 400 may inject the descriptors of complexity C_(L) into a database, such as the data repository 150, using the neural network N_(L). For example, the identification component 144 may apply the descriptors of complexity C_(L) (e.g., C₁ for one level, C₂ for two levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150 using the neural network trained at block 408 as described above.

At block 420, the method 400 may determine if j=M, where M is the number of tracks. If the identification component 144 of the server 140 determines that j<M, then the identification component 144 may move onto block 422.

At block 422, the method 400 may increment the counter j by one. For example, the identification component 144 may increment the counter j by one as described above.

In some implementations, the method 400 may iteratively perform some or all of the steps between blocks 406 and 422 until, at block 422, the identification component 144 of the server 140 determines that j=M. If the identification component 144 of the server 140 determines that j=M, then the identification component 144 may move onto block 424 to terminate the method 400.

Aspects of the present disclosures may be implemented using hardware, software, or a combination thereof and may be implemented in one or more computer systems or other processing systems. In an aspect of the present disclosures, features are directed toward one or more computer systems capable of carrying out the functionality described herein. An example of such a computer system 500 is shown in FIG. 5. In some examples, the server 140 may be implemented as the computer system 500 shown in FIG. 5. The server 140 may include some or all of the components of the computer system 500.

The computer system 500 includes one or more processors, such as processor 504. The processor 504 is connected with a communication infrastructure 506 (e.g., a communications bus, cross-over bar, or network). Various software aspects are described in terms of this example computer system. After reading this description, it will become apparent to a person skilled in the relevant art(s) how to implement aspects of the disclosures using other computer systems and/or architectures.

The computer system 500 may include a display interface 502 that forwards graphics, text, and other data from the communication infrastructure 506 (or from a frame buffer not shown) for display on a display unit 550. Computer system 500 also includes a main memory 508, preferably random access memory (RAM), and may also include a secondary memory 510. The secondary memory 510 may include, for example, a hard disk drive 512, and/or a removable storage drive 514, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, a universal serial bus (USB) flash drive, etc. The removable storage drive 514 reads from and/or writes to a removable storage unit 518 in a well-known manner. Removable storage unit 518 represents a floppy disk, magnetic tape, optical disk, USB flash drive, etc., which is read by and written to removable storage drive 514. As will be appreciated, the removable storage unit 518 includes a computer usable storage medium having stored therein computer software and/or data. In some examples, one or more of the main memory 508, the secondary memory 510, the removable storage unit 518, and/or the removable storage unit 522 may be a non-transitory memory.

Alternative aspects of the present disclosures may include secondary memory 510 and may include other similar devices for allowing computer programs or other instructions to be loaded into computer system 500. Such devices may include, for example, a removable storage unit 522 and an interface 520. Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an erasable programmable read only memory (EPROM), or programmable read only memory (PROM)) and associated socket, and other removable storage units 522 and interfaces 520, which allow software and data to be transferred from the removable storage unit 522 to computer system 500.

Computer system 500 may also include a communications circuit 524. The communications circuit 524 may allow software and data to be transferred between computer system 500 and external devices. Examples of the communications circuit 524 may include a modem, a network interface (such as an Ethernet card), a communications port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc. Software and data transferred via the communications circuit 524 are in the form of signals 528, which may be electronic, electromagnetic, optical or other signals capable of being received by the communications circuit 524. These signals 528 are provided to the communications circuit 524 via a communications path (e.g., channel) 526. This path 526 carries signals 528 and may be implemented using wire or cable, fiber optics, a telephone line, a cellular link, an RF link and/or other communications channels. In this document, the terms “computer program medium” and “computer usable medium” are used to refer generally to media such as the removable storage unit 518, a hard disk installed in hard disk drive 512, and signals 528. These computer program products provide software to the computer system 500. Aspects of the present disclosures are directed to such computer program products.

Computer programs (also referred to as computer control logic) are stored in main memory 508 and/or secondary memory 510. Computer programs may also be received via communications circuit 524. Such computer programs, when executed, enable the computer system 500 to perform the features in accordance with aspects of the present disclosures, as discussed herein. In particular, the computer programs, when executed, enable the processor 504 to perform the features in accordance with aspects of the present disclosures. Accordingly, such computer programs represent controllers of the computer system 500.

In an aspect of the present disclosures where the method is implemented using software, the software may be stored in a computer program product and loaded into computer system 500 using removable storage drive 514, hard drive 512, or communications interface 520. The control logic (software), when executed by the processor 504, causes the processor 504 to perform the functions described herein. In another aspect of the present disclosures, the system is implemented primarily in hardware using, for example, hardware components, such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).

It will be appreciated that various implementations of the above-disclosed and other features and functions, or alternatives or varieties thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

What is claimed is:
 1. A method of identifying targets, comprising: receiving a first plurality of snapshots; generating a first plurality of descriptors each associated with the first plurality of snapshots; grouping the first plurality of snapshots into at least one cluster based on the plurality of descriptors; selecting a representative snapshot for each of the at least one cluster; generating at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and identifying a target based on comparing the at least second descriptor and a third descriptor.
 2. The method of claim 1, wherein the third descriptor is associated with a second plurality of snapshots or an input query.
 3. The method of claim 2, wherein the input query includes one or more arrays of numbers representing an intended target.
 4. The method of claim 1, further comprising, prior to receiving the first plurality of snapshots: receiving a second plurality of snapshots; generating a third plurality of descriptors each associated with the second plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors; grouping the second plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and selecting a second plurality of representative snapshots as the first plurality of snapshots.
 5. The method of claim 1, further comprising: classifying the representative snapshot for each of the at least one cluster; aggregating classification scores of the representative snapshot for each of the at least one cluster; determining a class based on the aggregated classification scores; and wherein the at least one descriptor is a class-specific descriptor.
 6. The method of claim 1, further comprising, prior to selecting the representative snapshot, estimating a mean average precision (MAP) for each snapshot in the at least one cluster.
 7. The method of claim 6, wherein selecting the representative snapshot for each of the at least one cluster comprises selecting a snapshot in the at least one cluster having a highest estimated MAP.
 8. The method of claim 6, wherein estimating a MAP comprises using a neural network to estimate the MAP.
 9. A non-transitory computer readable medium comprising instructions stored therein that, when executed by a processor of a system, cause the processor to: receive a first plurality of snapshots; generate a first plurality of descriptors each associated with the first plurality of snapshots; group the first plurality of snapshots into at least one cluster based on the plurality of descriptors; select a representative snapshot for each of the at least one cluster; generate at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and identify a target based on comparing the at least second descriptor and a third descriptor.
 10. The non-transitory computer readable medium of claim 9, wherein the third descriptor is associated with a second plurality of snapshots or an input query.
 11. The non-transitory computer readable medium of claim 10, wherein the input query includes one or more arrays of numbers representing an intended target.
 12. The non-transitory computer readable medium of claim 9, further comprising instructions that, prior to receiving the first plurality of snapshots, cause the processor to: receive a second plurality of snapshots; generate a third plurality of descriptors each associated with the second plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors; group the second plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and select a second plurality of representative snapshots as the first plurality of snapshots.
 13. The non-transitory computer readable medium of claim 9, further comprising instructions that cause the processor to: classify the representative snapshot for each of the at least one cluster; aggregate classification scores of the representative snapshot for each of the at least one cluster; determine a class based on the aggregated classification scores; and wherein the at least one descriptor is a class-specific descriptor.
 14. The non-transitory computer readable medium of claim 9, further comprising instructions that, prior to selecting the representative snapshot, cause the processor to estimate a mean average precision (MAP) for each snapshot in the at least one cluster.
 15. The non-transitory computer readable medium of claim 14, wherein the instructions for selecting the representative snapshot for each of the at least one cluster comprises instructions for selecting a snapshot in the at least one cluster having a highest estimated MAP.
 16. The non-transitory computer readable medium of claim 14, wherein the instructions for estimating a MAP comprises instructions for using a neural network to estimate the MAP.
 17. A system, comprising: memory that stores instructions; and a processor configured to execute the instructions to: receive a first plurality of snapshots; generate a first plurality of descriptors each associated with the first plurality of snapshots; group the first plurality of snapshots into at least one cluster based on the plurality of descriptors; select a representative snapshot for each of the at least one cluster; generate at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and identify a target based on comparing the at least second descriptor and a third descriptor.
 18. The system of claim 17, wherein the third descriptor is associated with a second plurality of snapshots or an input query.
 19. The system of claim 18, wherein the input query includes one or more arrays of numbers representing an intended target.
 20. The system of claim 17, wherein the processor is further configured to, prior to selecting the representative snapshot, estimate a mean average precision (MAP) for each snapshot in the at least one cluster using a neural network.